
Implementation of VecCorrBijector #246

Merged: 179 commits from torfjelde/vec-corr into master, Jun 12, 2023
Conversation

@torfjelde (Member):

This PR implements a `VecCorrBijector`, i.e. a `CorrBijector` whose unconstrained space is treated as a `Vector` of the correct size, rather than as a `Matrix` that is really embedded in a subspace of matrices.

This is particularly useful when doing things like MCMC sampling, where the full matrix representation might give the sampler the false impression that the dimension is d × d instead of (d choose 2) (as is the case here).
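To make the dimension-counting concrete, here's a tiny illustration (the names below are ours, not the PR's):

```julia
# A correlation matrix is determined by its strict upper triangle, so the
# unconstrained representation only needs binomial(d, 2) entries, not d^2.
d = 4
n_unconstrained = binomial(d, 2)              # 6 free parameters
@assert n_unconstrained == d * (d - 1) ÷ 2    # compare with d * d = 16
```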

From the diff under review (excerpt):

```julia
with_logabsdet_jacobian(b::VecCorrBijector, x) = transform(b, x), logabsdetjac(b, x)

function transform(::VecCorrBijector, X::AbstractMatrix{<:Real})
    w = upper_triangular(parent(cholesky(X).U))
    # ...
```
@torfjelde (Member Author):

I hate this :(

Member:

Could we not have `w = cholesky(X).U` and work with a `w::UpperTriangular` instead of a dense matrix? I tried it locally, though there was no real gain in performance.

@torfjelde (Member Author):

> Could we not have `w = cholesky(X).U` and work with a `w::UpperTriangular` instead of a dense matrix?

Yeah, but this won't work with some of the AD backends (I know, it's super annoying...). If you have a look at our compat code for ReverseDiff (I believe), you'll see that we have to do some custom work to compute the pullback.

> Tried it locally, no real gain in performance though.

I don't think we'd expect a gain: internally we're iterating over the relevant elements of the matrix anyway, so we don't win anything by telling the rest of the computational path that we're working on a triangular matrix; it already assumes the given matrix is lower-triangular.
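To illustrate the point, a toy sketch (ours, not the actual `_link_chol_lkj`): a function that explicitly walks only one triangle does identical work whether or not the input is wrapped in `UpperTriangular`.

```julia
# Toy stand-in for a link-style function: it only ever reads W[i, j] for
# i <= j, so an UpperTriangular wrapper on W changes nothing about the work done.
function sum_upper_triangle(W::AbstractMatrix)
    n = size(W, 1)
    s = zero(eltype(W))
    for j in 1:n, i in 1:j
        s += W[i, j]
    end
    return s
end
```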

Member:

Completely unrelated, but if it is not 100% guaranteed that you always end up with an upper-triangular matrix when calling `cholesky` (which I think you can't guarantee if `AbstractMatrix` is supported), it would be better to work with `.UL` instead of `.U` (as we already do in other places in Bijectors and other libraries).
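For context, a small sketch of what `.UL` gives you (standard LinearAlgebra behaviour; the matrix here is made up):

```julia
using LinearAlgebra

A = [4.0 2.0; 2.0 3.0]   # symmetric positive definite
C = cholesky(A)
F = C.UL                  # the factor exactly as stored (here UpperTriangular)
# Accessing the non-stored factor (C.L here) copies the adjoint of the
# stored factor; C.UL avoids that copy entirely.
```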

@torfjelde (Member Author):

It's 100% guaranteed that `.U` is available though, right? So it's a question of efficiency, not correctness.

The problem here is that:

1. We need to work with a lower-triangular matrix (unless we re-implement `_link_chol_lkj` to work with a vector).
2. We drop the type information in the construction of `w`, so we don't actually know whether it was upper- or lower-triangular (in the case where we do `cholesky(...).UL`).

All in all, it seems we need an additional layer of indirection, e.g.

```julia
link_chol_lkj(x::LowerTriangular) = link_chol_lkj(lower_triangular(parent(x)))
link_chol_lkj(x::UpperTriangular) = link_chol_lkj(transpose(upper_triangular(parent(x))))
link_chol_lkj(x::AbstractMatrix) = _link_chol_lkj(x)  # assume it's lower-triangular
```

?

Member:

I would guess it's mainly for efficiency reasons. But it's difficult to say if there are other implications as well, e.g. regarding AD. An alternative to `.UL` would be something like what's used in PDMats: https://github.com/JuliaStats/PDMats.jl/blob/fff131e11e23403931a42f5bfb3384f0d2b114c9/src/chol.jl#L6-L11 That should also be quite efficient, and you could continue working with upper-triangular matrices.
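Paraphrasing the linked PDMats code (a sketch from memory, not the verbatim source): it dispatches on which factor the `Cholesky` object stores and only ever takes a lazy adjoint, never a copy.

```julia
using LinearAlgebra

# Return the requested factor without a copy: use the stored factor directly
# when it has the right orientation, otherwise take its lazy adjoint.
chol_lower(C::Cholesky) = C.uplo === 'L' ? C.L : C.U'
chol_upper(C::Cholesky) = C.uplo === 'U' ? C.U : C.L'
```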

@torfjelde (Member Author):

> I would guess it's mainly for efficiency reasons.

https://github.com/JuliaLang/julia/blob/db7971f49912d1abba703345ca6eb43249607f32/stdlib/LinearAlgebra/src/cholesky.jl#L515-L527

> But it's difficult to say if there are other implications as well, e.g., regarding AD.

Hmm, fair, though IMO something like this seems like it would be a bug in the AD package, no?

> An alternative to `.UL` would be something like what's used in PDMats: https://github.com/JuliaStats/PDMats.jl/blob/fff131e11e23403931a42f5bfb3384f0d2b114c9/src/chol.jl#L6-L11 That should also be quite efficient and you could continue working with upper-triangular matrices.

@harisorgn and I were just having a chat, and we're thinking of replacing `upper_triangular(parent(cholesky(X).U))` with

```julia
cholesky_upper(X) = upper_triangular(parent(cholesky(X).U))
cholesky_lower(X) = lower_triangular(parent(cholesky(X).L))
```

to make it less likely that we forget or mess up somewhere.

But we could make it

```julia
cholesky_upper(X) = upper_triangular(parent(PDMats.chol_upper(cholesky(X))))
cholesky_lower(X) = lower_triangular(parent(PDMats.chol_lower(cholesky(X))))
```

But are you sure there isn't a good reason why the default is `copy`? Of course it's more memory-intensive, but will something like `LowerTriangular(U')` lead to slower computation paths (since you're now working with `adjoint(U)` rather than something that is actually lower-triangular)? E.g. indexing `adjoint(U)` surely involves more computation than indexing `copy(adjoint(U))`.
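On the indexing question, a minimal illustration (ours, not a benchmark from this thread): the adjoint is a lazy wrapper, so every access goes through an index swap, whereas `copy` materializes direct storage.

```julia
using LinearAlgebra

U = UpperTriangular([1.0 2.0; 0.0 3.0])
L_lazy  = U'        # LowerTriangular over a lazy Adjoint: each access swaps indices
L_dense = copy(U')  # LowerTriangular over a plain dense Matrix: direct access
@assert L_lazy == L_dense
```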

(Review thread on src/bijectors/corr.jl: outdated, resolved.)
@yebai mentioned this pull request on Jun 6, 2023.
@harisorgn (Member) commented Jun 6, 2023:

> but always wrap the matrix in a `Symmetric`, `Hermitian`, etc. wrapper first. This will dispatch to optimized methods, avoid checks such as `ishermitian`, and also yield different paths and results in AD.

Thanks David! This seems to work for our issue. The interface tests for v1.6 now fail on the `Hermitian` constructor during ReverseDiff. It seems that an `rrule` for the constructor is accounted for on latest but not on v1.6 (though I couldn't find it explicitly)?
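For reference, the suggested wrapping looks roughly like this (a sketch; the matrix `X` is made up):

```julia
using LinearAlgebra

X = let A = randn(4, 4)
    A * A' + 4I   # positive definite by construction
end
# Wrapping first dispatches cholesky to the Hermitian method, skipping the
# exact ishermitian check that tiny AD-induced asymmetries would otherwise fail.
C = cholesky(Hermitian(X))
```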

@devmotion (Member):

My initial guess would be that this is caused by some changes in LinearAlgebra rather than AD (ReverseDiff didn't change its Julia compat and doesn't use `rrule`s in general).

@torfjelde (Member Author):

I messed up and merged the formatting PR into master before this 🤦 So the result was quite a lot of conflicts. I've resolved them now though 👍

@devmotion (Member):

Regarding `cholesky`/`Hermitian`: I assume it works on Julia >= 1.8 due to JuliaLang/julia#44076.

@harisorgn (Member):

> Regarding `cholesky`/`Hermitian`: I assume it works on Julia >= 1.8 due to JuliaLang/julia#44076.

Ah yes, thanks once again! Sorry, I misread the stack trace on my phone and thought the issue was the `Hermitian` constructor 😅

How should we handle this then? The `Hermitian` wrapper before `cholesky` solves the ForwardDiff issue with the tiny numerical mismatch, but creates an instance of `<: Hermitian{....TrackedArray....}` that `cholesky` can't handle on v1.6.

@torfjelde (Member Author):

One straightforward (but hacky) solution is of course:

```julia
cholesky_factor(X::ReverseDiff.TrackedArray) = cholesky_factor(cholesky(X))
```

Also, though I don't see another fix, I'm uncertain whether `Hermitian` is the 100% correct solution, since during sampling we can easily run into matrices which end up not being Hermitian due to numerical issues. If that happens, it seems to me that we'd want to error rather than just continue, no?

@harisorgn (Member):

> One straightforward (but hacky) solution is of course

I was just checking in case I was missing something less hacky, but let's go with this for now 👍 I've added this method in src/compat/reversediff.jl to avoid a ReverseDiff dependency outside that submodule; it's a bit annoying, though, as it is not grouped with the rest of the methods.

> we can easily run into matrices which end up not being Hermitian due to numerical issues. If that happens, it seems to me that we'd want to error rather than just continue, no?

Should we have something like an `ishermitian` check but with tolerances? The ForwardDiff issue was numerical but tiny, as far as I tested it.
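Such a check could look roughly like this (a hypothetical helper, not something this PR adds):

```julia
using LinearAlgebra

# Accept X as "Hermitian enough" when its asymmetry is small relative to X itself;
# the caller can then error explicitly instead of silently symmetrizing.
function ishermitian_approx(X::AbstractMatrix; rtol=sqrt(eps(float(real(eltype(X))))))
    return norm(X - X') <= rtol * norm(X)
end
```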

(Review thread on src/compat/reversediff.jl: outdated, resolved.)
```julia
# NOTE: `CorrBijector` technically isn't bijective, and so the default `getjacobian`
# used in the ChangesOfVariables.jl tests will fail as the jacobian will have determinant 0.
# Hence, we disable those tests.
test_bijector(b, x; test_not_identity=d != 1, changes_of_variables_test=false)
```
Member:

The `InverseFunctions.test_inverse` that is called in here fails on some calls but passes on others, even with the same sample `x`. CI on previous commits of this PR failed because of it. Seems weird; I'm not sure why.

Member:

@harisorgn do we generate a random number somewhere inside the tests? If so, maybe fix the random seed?

Member:

There was no RNG between these lines; it was after sampling from LKJ. I believe the issue was numerical: there was no explicit zero-filling in the old Matrix-version link function, which caused mismatches. I've added zero-filling and tested it a bunch locally, so it should be fine 🤞
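As a sketch of the zero-filling idea (a hypothetical helper, not the PR's exact code): when reconstructing the triangular factor from the unconstrained vector, start from an explicitly zeroed matrix instead of leaving the unused triangle to chance.

```julia
# Place the binomial(d, 2) unconstrained entries into the strict upper
# triangle of a matrix that is zero everywhere else.
function strict_upper_from_vec(y::AbstractVector, d::Int)
    W = zeros(eltype(y), d, d)   # explicit zero-filling
    k = 1
    for j in 2:d, i in 1:(j - 1)
        W[i, j] = y[k]
        k += 1
    end
    return W
end
```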

@harisorgn (Member):

Do we want to keep `CorrBijector` in general? Will it have a purpose after this PR?

@torfjelde (Member Author):

> Do we want to keep `CorrBijector` in general? Will it have a purpose after this PR?

Good question. I'd say keep it for now and open an issue.

@yebai merged commit 3a0b7e3 into master on Jun 12, 2023; the torfjelde/vec-corr branch was then deleted.
@yebai (Member) commented Jun 12, 2023:

Great work -- thanks @harisorgn, @torfjelde and @devmotion!
